Mixing Domain Rules with Machine Learning for Radiology Text Classification
نویسندگان
چکیده
In this work we compare and contrast proposed techniques and experiments in the subdomain of radiology text classification. In the past, systems have relied on two main approaches: rule-based or natural language processing plus machine learning (NLP/ML). Simple rule-based approaches have demonstrated great efficacy, at the high cost of requiring medical professionals to build and maintain. Complex NLP pipeline and ML approaches have also demonstrated efficacy, but rely heavily on computing professionals to build and maintain, in addition to requiring the involvement of medical domain experts. We propose a hybrid classification mechanism for radiology text combining rules and NLP/ML that in our trials surpasses existing rule-based and NLP/ML techniques for the task of classifying incidental findings. We evaluate our approach against three approaches from related work using our own gold standard data set of 661 records. Our hybrid approach achieves a 13% F-measure gain over our prior rulebased approach[16] and a 4% F-measure improvement vs. a manual classification process in a hospital setting.
منابع مشابه
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملText Classification Rule Induction in the Presence of Domain-Specific Expression Forms
We describe a new method for learning text-classification rules from examples. The text consists of messages written by students in an online learning environment, and it may contain ungrammatical expressions as well as specialized expressions such as formulas. The method is based on the version-space machine learning technique. Experiments show that our method successfully generalizes over cer...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملارائه مدلی برای استخراج اطلاعات از مستندات متنی، مبتنی بر متنکاوی در حوزه یادگیری الکترونیکی
As computer networks become the backbones of science and economy, enormous quantities documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discoveries unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
متن کاملA Modular System for Rule-based Text Categorisation
We introduce a modular rule-based approach to text categorisation which is more flexible and less time consuming to build than a standard rule-based system because it works with a hierarchical structure and allows for reusability of rules. When compared to currently more wide-spread machine learning models on a case study, our modular system shows competitive results, and it has the advantage o...
متن کامل